Results 1 - 11 of 11
1.
Epidemics ; 39: 100576, 2022 06.
Article in English | MEDLINE | ID: covidwho-1851042

ABSTRACT

The SARS-CoV-2 pandemic led to a huge increase in global pathogen genome sequencing efforts, and the resulting data are becoming increasingly important to detect variants of concern, monitor outbreaks, and quantify transmission dynamics. However, this rapid up-scaling in data generation brought with it many IT infrastructure challenges. In this paper, we report on the development of an improved system for genomic epidemiology. We (i) highlight key challenges that were exacerbated by the pandemic situation, (ii) provide data infrastructure design principles to address them, and (iii) give an implementation example developed by the Swiss SARS-CoV-2 Sequencing Consortium (S3C) in response to the COVID-19 pandemic. Finally, we discuss remaining challenges to data infrastructure for genomic epidemiology. Improving these infrastructures will help better detect, monitor, and respond to future public health threats.


Subject(s)
COVID-19 , Computational Biology/statistics & numerical data , Genomics , Pandemics , SARS-CoV-2/genetics , COVID-19/epidemiology , Computational Biology/trends , Humans , Molecular Sequence Data , Switzerland/epidemiology
2.
J Biomed Semantics ; 12(1): 13, 2021 07 18.
Article in English | MEDLINE | ID: covidwho-1484319

ABSTRACT

BACKGROUND: Effective response to public health emergencies, such as we are now experiencing with COVID-19, requires data sharing across multiple disciplines and data systems. Ontologies offer a powerful data sharing tool, and this holds especially for those ontologies built on the design principles of the Open Biomedical Ontologies Foundry. These principles are exemplified by the Infectious Disease Ontology (IDO), a suite of interoperable ontology modules aiming to provide coverage of all aspects of the infectious disease domain. At its center is IDO Core, a disease- and pathogen-neutral ontology covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is extended by disease and pathogen-specific ontology modules. RESULTS: To assist the integration and analysis of COVID-19 data, and viral infectious disease data more generally, we have recently developed three new IDO extensions: IDO Virus (VIDO); the Coronavirus Infectious Disease Ontology (CIDO); and an extension of CIDO focusing on COVID-19 (IDO-COVID-19). Reflecting the fact that viruses lack cellular parts, we have introduced into IDO Core the term acellular structure to cover viruses and other acellular entities studied by virologists. We now distinguish between infectious agents - organisms with an infectious disposition - and infectious structures - acellular structures with an infectious disposition. This in turn has led to various updates and refinements of IDO Core's content. We believe that our work on VIDO, CIDO, and IDO-COVID-19 can serve as a model for yielding greater conformance with ontology building best practices. CONCLUSIONS: IDO provides a simple recipe for building new pathogen-specific ontologies in a way that allows data about novel diseases to be easily compared, along multiple dimensions, with data represented by existing disease ontologies. The IDO strategy, moreover, supports ontology coordination, providing a powerful method of data integration and sharing that allows physicians, researchers, and public health organizations to respond rapidly and efficiently to current and future public health crises.


Subject(s)
Biological Ontologies/statistics & numerical data , COVID-19/prevention & control , Communicable Disease Control/statistics & numerical data , Communicable Diseases/therapy , Computational Biology/statistics & numerical data , SARS-CoV-2/isolation & purification , COVID-19/epidemiology , COVID-19/virology , Communicable Disease Control/methods , Communicable Diseases/epidemiology , Communicable Diseases/transmission , Computational Biology/methods , Data Mining/methods , Data Mining/statistics & numerical data , Epidemics , Humans , Information Dissemination/methods , Public Health/methods , Public Health/statistics & numerical data , SARS-CoV-2/physiology , Semantics
3.
Nucleic Acids Res ; 49(D1): D266-D273, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387962

ABSTRACT

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65 351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Protein Domains , Proteins/chemistry , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Molecular Sequence Annotation , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
4.
Nucleic Acids Res ; 49(D1): D92-D96, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387961

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , Pandemics
5.
Nucleic Acids Res ; 49(D1): D261-D265, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387959

ABSTRACT

ADP-ribosylation is a protein modification responsible for biological processes such as DNA repair, RNA regulation, cell cycle and biomolecular condensate formation. Dysregulation of ADP-ribosylation is implicated in cancer, neurodegeneration and viral infection. We developed ADPriboDB (adpribodb.leunglab.org) to facilitate studies in uncovering insights into the mechanisms and biological significance of ADP-ribosylation. ADPriboDB 2.0 serves as a one-stop repository comprising 48 346 entries and 9097 ADP-ribosylated proteins, of which 6708 were newly identified since the original database release. In this updated version, we provide information regarding the sites of ADP-ribosylation in 32 946 entries. The wealth of information allows us to interrogate existing databases or newly available data. For example, we found that ADP-ribosylated substrates are significantly associated with the recently identified human protein interaction networks associated with SARS-CoV-2, which encodes a conserved protein domain called macrodomain that binds and removes ADP-ribosylation. In addition, we create a new interactive tool to visualize the local context of ADP-ribosylation, such as structural and functional features as well as other post-translational modifications (e.g. phosphorylation, methylation and ubiquitination). This information provides opportunities to explore the biology of ADP-ribosylation and generate new hypotheses for experimental testing.


Subject(s)
Adenosine Diphosphate Ribose/metabolism , Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Proteins/metabolism , ADP-Ribosylation , Binding Sites , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Humans , Protein Domains , Protein Processing, Post-Translational , Proteins/chemistry , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/chemistry , Viral Proteins/metabolism
6.
J Biomed Semantics ; 12(1): 15, 2021 08 09.
Article in English | MEDLINE | ID: covidwho-1350153

ABSTRACT

BACKGROUND: The ontology authoring step in ontology development involves making choices about what subject domain knowledge to include. This may concern sorting out ontological differences and making choices between conflicting axioms due to limitations in the logic or the subject domain semantics. Examples are dealing with different foundational ontologies in ontology alignment and OWL 2 DL's transitive object property versus a qualified cardinality constraint. Such conflicts have to be resolved somehow. However, only isolated and fragmented guidance for doing so is available, which results in ad hoc decisions that may not be the best choice or may be forgotten later. RESULTS: This work aims to address this by taking steps towards a framework to deal with the various types of modeling conflicts through meaning negotiation and conflict resolution in a systematic way. It proposes an initial library of common conflicts, a conflict set, typical steps toward resolution, and the software availability and requirements needed for it. The approach was evaluated with an actual case of domain knowledge usage in the context of an epizootic disease outbreak, namely avian influenza, and with running examples from COVID-19 ontologies. CONCLUSIONS: The evaluation demonstrated the potential and feasibility of a conflict resolution framework for ontologies.


Subject(s)
Biological Ontologies/statistics & numerical data , Computational Biology/statistics & numerical data , Information Storage and Retrieval/statistics & numerical data , Semantic Web , Semantics , Vocabulary, Controlled , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Databases, Factual/statistics & numerical data , Epidemics/prevention & control , Humans , Information Storage and Retrieval/methods , Logic , SARS-CoV-2/physiology
7.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-947664

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subject(s)
COVID-19/prevention & control , Computational Biology/statistics & numerical data , Databases, Nucleic Acid/statistics & numerical data , Information Storage and Retrieval/methods , SARS-CoV-2/genetics , Viral Proteins/genetics , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Computational Biology/organization & administration , Databases, Nucleic Acid/organization & administration , Global Health , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , Pandemics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Viral Proteins/metabolism
8.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-894614

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Protein , Proteins/metabolism , Proteome/metabolism , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Epidemics , Humans , Internet , Models, Molecular , Protein Structure, Tertiary , Proteins/chemistry , Proteins/genetics , Proteome/classification , Proteome/genetics , Repetitive Sequences, Amino Acid/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology , Sequence Analysis, Protein/methods
9.
Nucleic Acids Res ; 49(D1): D183-D191, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-873045

ABSTRACT

RNA molecules fold into complex structures that are important across many biological processes. Recent technological developments have enabled transcriptome-wide probing of RNA secondary structure using nucleases and chemical modifiers. These approaches have been widely applied to capture RNA secondary structure in many studies, but gathering and presenting such data from very different technologies in a comprehensive and accessible way has been challenging. Existing RNA structure probing databases usually focus on low-throughput or very specific datasets. Here, we present a comprehensive RNA structure probing database called RASP (RNA Atlas of Structure Probing) by collecting 161 deduplicated transcriptome-wide RNA secondary structure probing datasets from 38 papers. RASP covers 18 species across animals, plants, bacteria, fungi, and viruses, and categorizes 18 experimental methods including DMS-seq, SHAPE-Seq, SHAPE-MaP, and icSHAPE. Specifically, RASP curates up-to-date datasets from several RNA secondary structure probing studies of the RNA genome of SARS-CoV-2, the RNA virus that caused the ongoing COVID-19 pandemic. RASP also provides a user-friendly interface to query, browse, and visualize RNA structure profiles, offering a shortcut to accessing RNA secondary structures grounded in experimental data. The database is freely available at http://rasp.zhanglab.net.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Genetic/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Nucleic Acid Conformation , RNA/chemistry , Transcriptome , Animals , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Computational Biology/methods , Genome, Viral/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Pandemics , RNA/genetics , RNA Probes/genetics , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Fungal/chemistry , RNA, Fungal/genetics , RNA, Plant/chemistry , RNA, Plant/genetics , RNA, Viral/chemistry , RNA, Viral/genetics , SARS-CoV-2/genetics , SARS-CoV-2/physiology
10.
Res Synth Methods ; 12(2): 136-147, 2021 Mar.
Article in English | MEDLINE | ID: covidwho-838918

ABSTRACT

We researchers have taken searching for information for granted for far too long. The COVID-19 pandemic shows us the boundaries of academic searching capabilities, both in terms of our know-how and of the systems we have. With hundreds of studies published daily on COVID-19, for example, we struggle to find, stay up-to-date with, and synthesize information, all of which hampers evidence-informed decision making. This COVID-19 information crisis is indicative of the broader problem of information-overloaded academic research. To improve our finding capabilities, we urgently need to improve how we search and the systems we use. We respond to Klopfenstein and Dampier (Res Syn Meth. 2020) who commented on our 2020 paper and proposed a way of improving PubMed's and Google Scholar's search functionalities. Our response puts their commentary in a larger frame and suggests how we can improve academic searching altogether. We urge researchers to understand that search skills require dedicated education and training. Better and more efficient searching requires an initial understanding of the different goals that define the way searching needs to be conducted. We explain the main types of searching that we academics routinely engage in, distinguishing lookup, exploratory, and systematic searching. These three types must be conducted using different search methods (heuristics) and using search systems with specific capabilities. To improve academic searching, we introduce the "Search Triangle" model emphasizing the importance of matching goals, heuristics, and systems. Further, we suggest an urgently needed agenda toward search literacy as the norm in academic research and fit-for-purpose search systems.


Subject(s)
COVID-19 , Computational Biology/methods , Information Storage and Retrieval/methods , Search Engine , Biomedical Research , Computational Biology/statistics & numerical data , Computational Biology/trends , Humans , Information Storage and Retrieval/statistics & numerical data , Information Storage and Retrieval/trends , Pandemics , PubMed , Publications , Research Personnel , SARS-CoV-2
11.
BMC Med Res Methodol ; 20(1): 235, 2020 09 21.
Article in English | MEDLINE | ID: covidwho-781441

ABSTRACT

BACKGROUND: Data analysis and visualization are essential tools for exploring and communicating findings in medical research, especially in epidemiological surveillance. RESULTS: Data on COVID-19 diagnosed cases and mortality, from January 1st, 2020, onwards are collected automatically from the European Centre for Disease Prevention and Control (ECDC). We have developed a Shiny application for data visualization and analysis of several indicators to follow the SARS-CoV-2 epidemic using ECDC data. The result is a country-specific tool for basic epidemiological surveillance that is interactive and user-friendly. The available analyses cover time trends and projections, attack rate, population fatality rate, case fatality rate, and basic reproduction number. CONCLUSIONS: The COVID19-World online web application systematically produces daily updated country-specific data visualization and analysis of the SARS-CoV-2 epidemic worldwide. The application may contribute to a better understanding of the SARS-CoV-2 epidemic worldwide.
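
The abstract lists attack rate, population fatality rate, and case fatality rate without stating how the application computes them. As a rough illustration only, the sketch below uses common textbook definitions (cumulative cases or deaths per 100,000 population, and deaths as a percentage of diagnosed cases). The `CountryTotals` structure, the function names, and the numbers are hypothetical and are not taken from the COVID19-World application or from ECDC data; the original application is written in R with Shiny, whereas this sketch uses Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryTotals:
    """Cumulative totals for one country (hypothetical structure)."""
    population: int
    cases: int
    deaths: int


def per_100k(numerator: int, denominator: int) -> float:
    """Scale a ratio to 'per 100,000'; reject a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100_000 * numerator / denominator


def attack_rate(t: CountryTotals) -> float:
    """Cumulative diagnosed cases per 100,000 population (assumed definition)."""
    return per_100k(t.cases, t.population)


def population_fatality_rate(t: CountryTotals) -> float:
    """Cumulative deaths per 100,000 population (assumed definition)."""
    return per_100k(t.deaths, t.population)


def case_fatality_rate(t: CountryTotals) -> float:
    """Deaths as a percentage of diagnosed cases (assumed definition)."""
    if t.cases <= 0:
        raise ValueError("cases must be positive")
    return 100 * t.deaths / t.cases


# Made-up totals, for illustration only.
example = CountryTotals(population=1_000_000, cases=5_000, deaths=100)
print(attack_rate(example))               # cases per 100,000 inhabitants
print(population_fatality_rate(example))  # deaths per 100,000 inhabitants
print(case_fatality_rate(example))        # percent of diagnosed cases
```

The basic reproduction number and the projections mentioned in the abstract require a fitted epidemic model and are deliberately left out of this sketch.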


Subject(s)
Betacoronavirus/isolation & purification , Computational Biology/statistics & numerical data , Coronavirus Infections/epidemiology , Data Visualization , Pandemics , Pneumonia, Viral/epidemiology , Algorithms , Betacoronavirus/physiology , COVID-19 , Computational Biology/methods , Coronavirus Infections/transmission , Coronavirus Infections/virology , Europe/epidemiology , Global Health/statistics & numerical data , Humans , Incidence , Internet , Pneumonia, Viral/transmission , Pneumonia, Viral/virology , Population Surveillance/methods , SARS-CoV-2